
Computer Engineering

   

An Empirical Study of Redundant Dependencies in Open-Source Java Projects

  

  • Published: 2025-10-27


Abstract: Redundant dependencies in software projects can increase build size, add performance overhead, and raise the long-term maintenance burden. Although existing studies have investigated redundant dependencies in the Maven ecosystem, little is known about how they are distributed across dependency scopes (e.g., compile and test), how they evolve over time, and how they relate to project popularity. To address this gap, we select 2,214 open-source Java Maven projects from GitHub as study subjects. We use an mvn command to identify dependencies that are declared but never actually used, and quantify the redundancy ratio within each scope. We then apply the Mann-Kendall non-parametric trend test to 3,817 historical versions from 698 projects to identify trends in the evolution of redundant dependencies. To assess the relationship between redundant dependencies and project popularity or community activity, we construct five GitHub-based popularity and activity metrics, including star growth rate, fork growth rate, and issue closing rate, and perform Pearson correlation analysis. The results show that redundant dependencies are concentrated in the compile and test scopes, with median redundancy ratios of 33.33% and 30.00%, respectively. In terms of evolution, 48.1% of the projects maintained a stable redundancy ratio, 36.2% fluctuated, and a small proportion showed an increasing or decreasing trend. In the correlation analysis, only the issue closing rate shows a statistically significant, though weak, negative correlation with the redundancy ratio. These findings give developers a detailed view of dependency management and can help them optimize project configurations and improve software maintainability.
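The per-scope redundancy ratio and the Pearson correlation mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes that the redundancy ratio of a scope is the number of declared-but-unused dependencies divided by the number of declared dependencies in that scope; the abstract does not spell out the exact formula. The function names and the (scope, is_used) input format are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from math import sqrt


def redundancy_ratio_by_scope(dependencies):
    """Return {scope: unused / declared} for one project.

    `dependencies` is an iterable of (scope, is_used) pairs, one per
    declared dependency. This input format is an assumption made for
    this sketch, not the study's actual data model.
    """
    declared = defaultdict(int)
    unused = defaultdict(int)
    for scope, is_used in dependencies:
        declared[scope] += 1
        if not is_used:
            unused[scope] += 1
    return {scope: unused[scope] / declared[scope] for scope in declared}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (norm_x * norm_y)


# Illustrative input only: three compile-scope and two test-scope dependencies.
example_deps = [
    ("compile", True),
    ("compile", True),
    ("compile", False),
    ("test", False),
    ("test", True),
]
example_ratios = redundancy_ratio_by_scope(example_deps)
```

In a study of this kind, one ratio per project would be paired with one metric value per project (for example, an issue closing rate) and passed to pearson_r; the significance test on the resulting coefficient is omitted here.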

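The Mann-Kendall trend test named in the abstract can be sketched as follows. This is a generic textbook version (two-sided, normal approximation with tie and continuity corrections), not necessarily the configuration used in the study. The function name, the 0.05 significance level, and the three-way labels are assumptions, and the sketch does not reproduce how the study separates "stable" from "fluctuating" projects.

```python
import math


def mann_kendall(values, alpha=0.05):
    """Two-sided Mann-Kendall trend test on a series ordered by time.

    Returns (label, s, z, p), where label is "increasing", "decreasing",
    or "no trend". The alpha default is an assumption for illustration.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")

    # S statistic: sum of sign(x_j - x_i) over all pairs with i < j.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)

    # Variance of S, corrected for groups of tied values.
    tie_counts = {}
    for v in values:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in tie_counts.values() if t > 1)
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0
    if var_s == 0:
        # All observations are identical, so there is no trend to detect.
        return "no trend", s, 0.0, 1.0

    # Z score with continuity correction.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))

    if p < alpha and s > 0:
        label = "increasing"
    elif p < alpha and s < 0:
        label = "decreasing"
    else:
        label = "no trend"
    return label, s, z, p


# Illustrative series of redundancy ratios across six versions of one project.
rising = mann_kendall([0.10, 0.20, 0.30, 0.40, 0.50, 0.60])
falling = mann_kendall([0.60, 0.50, 0.40, 0.30, 0.20, 0.10])
flat = mann_kendall([0.30, 0.30, 0.30, 0.30, 0.30, 0.30])
short = mann_kendall([0.10, 0.20, 0.30, 0.40])
```

Under this normal approximation, even a strictly monotone series of four versions gives p of about 0.09, so short version histories are rarely classified as trending at the 0.05 level.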